22 research outputs found

    Fast, Accurate and Detailed NoC Simulations

    Network-on-Chip (NoC) architectures have a wide variety of parameters that can be adapted to the designer's requirements. Fast exploration of this parameter space is only possible at a high level of abstraction, and several methods have been proposed for it. Cycle- and bit-accurate simulation is necessary when the actual router's RTL description needs to be evaluated and verified. However, extensive simulation of the NoC architecture with cycle and bit accuracy is prohibitively time-consuming. In this paper we describe a method to simulate large parallel homogeneous and heterogeneous networks-on-chip on a single FPGA. The method is especially suitable for parallel systems where lengthy cycle- and bit-accurate simulations are required. As a case study, we use a NoC that was modelled and simulated in SystemC. We simulate the same NoC on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation, we achieved a speed-up of a factor of 80-300 without compromising cycle- and bit-level accuracy.

    Using an FPGA for Fast Bit Accurate SoC Simulation

    In this paper we describe a sequential simulation method to simulate large parallel homogeneous and heterogeneous systems on a single FPGA. The method is applicable to parallel systems where lengthy cycle- and bit-accurate simulations are required. It is particularly designed for systems that do not fit completely on the simulation platform (i.e. the FPGA). As a case study, we use a Network-on-Chip (NoC) that is simulated both in SystemC and on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation, we achieved a speed improvement of a factor of 80-300 without compromising cycle- and bit-level accuracy.
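
The abstract does not spell out how the sequential method works. As a rough illustration only, one common way to simulate a parallel synchronous system sequentially while staying cycle accurate is to evaluate each entity in turn against the state of the previous cycle (double buffering). The ring topology and the names `simulate_cycle`, `run` and `step` below are assumptions made for this sketch, not the paper's design.

```python
def simulate_cycle(states, step):
    """Advance every entity by one clock cycle, one entity at a time.

    All entities read the *old* state list, so the result is the same as if
    they had been evaluated in parallel (hypothetical ring topology).
    """
    n = len(states)
    next_states = [None] * n
    for i in range(n):
        left_neighbour = states[(i - 1) % n]  # input comes from the old state
        next_states[i] = step(states[i], left_neighbour)
    return next_states


def run(states, step, cycles):
    """Run the sequential simulation for a number of clock cycles."""
    for _ in range(cycles):
        states = simulate_cycle(states, step)
    return states


def shift(own, left):
    """Example entity behaviour: a register that copies its left neighbour."""
    return left


if __name__ == "__main__":
    print(run([1, 0, 0, 0], shift, 1))
```

Simulating one system cycle costs one evaluation step per entity, which is the price paid for needing only a single evaluation unit plus storage for the entity states.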

    Implementation of a 2-D 8x8 IDCT on the Reconfigurable Montium Core

    This paper describes the mapping of a two-dimensional inverse discrete cosine transform (2-D IDCT) onto the word-level reconfigurable Montium tile processor (TP). It shows that the IDCT can be mapped onto the Montium TP with reasonable effort and presents performance figures in terms of energy consumption, speed and silicon cost. The Montium results are compared with IDCT implementations on three other architectures: a TI DSP, an ASIC and an ARM.

    Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

    Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results show the processing time, accuracy, energy consumption and flexibility of the implementation.

    An optimal architecture for a DDC

    Digital down conversion (DDC) is an algorithm used to lower the number of samples per second by selecting a limited frequency band out of a stream of samples. A possible DDC algorithm consists of two simple cascaded integrator-comb (CIC) filters and a finite impulse response (FIR) filter, preceded by a modulator that is controlled by a numerically controlled oscillator (NCO). Implementations of the algorithm have been made for five architectures: two application-specific integrated circuits (ASICs), a general-purpose processor (GPP), a field-programmable gate array (FPGA), and the Montium tile processor (TP). All architectures are functionally capable of performing the algorithm. The differences between the architectures are their performance, flexibility and energy consumption. In this paper, we compare the energy consumption of the architectures when performing the DDC algorithm. The ASIC is the best solution if digital down conversion is constantly required. When digital down conversion is needed only part of the time, the Altera Cyclone II is the best solution due to its smaller technology size. In the spare time, the reconfigurable architectures can be reconfigured for other tasks of today's multimedia devices.
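
The processing chain named in the abstract (NCO-driven mixer, two CIC stages, FIR filter) can be sketched in floating point as follows. This is a minimal illustration under assumed parameters: the first-order CIC stages, the decimation rates, the filter taps and all function names are choices made for the sketch and are not taken from the paper.

```python
import cmath
import math


def nco(freq, sample_rate, n):
    """Return n complex oscillator samples exp(-j*2*pi*freq*k/sample_rate)."""
    return [cmath.exp(-2j * math.pi * freq * k / sample_rate) for k in range(n)]


def cic_decimate(samples, rate):
    """First-order CIC decimator: integrate, keep every rate-th value, comb.

    The output is divided by `rate` to undo the DC gain of the stage.
    """
    acc = 0
    integrated = []
    for s in samples:
        acc += s
        integrated.append(acc)
    decimated = integrated[rate - 1::rate]
    out = []
    prev = 0
    for v in decimated:
        out.append((v - prev) / rate)  # comb: difference of successive sums
        prev = v
    return out


def fir(samples, taps):
    """Direct-form FIR filter with zero initial state."""
    out = []
    for i in range(len(samples)):
        acc = 0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * samples[i - j]
        out.append(acc)
    return out


def ddc(samples, centre_freq, sample_rate, rate1, rate2, taps):
    """Mix the selected band down to baseband, then decimate and filter."""
    lo = nco(centre_freq, sample_rate, len(samples))
    mixed = [s * l for s, l in zip(samples, lo)]
    stage1 = cic_decimate(mixed, rate1)
    stage2 = cic_decimate(stage1, rate2)
    return fir(stage2, taps)


if __name__ == "__main__":
    fs = 1000.0
    tone = [math.cos(2 * math.pi * 250.0 * k / fs) for k in range(64)]
    print(ddc(tone, 250.0, fs, 4, 2, [0.5, 0.5]))
```

In the usage example a unit-amplitude tone at the centre frequency is shifted to DC, and the output rate is 1/8 of the input rate (64 samples in, 8 out). A hardware implementation would use fixed-point arithmetic and higher-order CIC stages; the sketch only shows the data flow.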

    Complexity analysis for mapping a DRM receiver on a heterogeneous tiled architecture

    In this article we present the results of partitioning the OFDM baseband processing of a DRM receiver into smaller independent processes. Furthermore, we give a short introduction to the relevant parts of the DRM standard. Based on the number of multiplications and additions, we can map individual processes onto a heterogeneous multi-tile architecture. This architecture can meet both the computational demands and the restricted energy budget.

    Routing of guaranteed throughput traffic in a network-on-chip

    This paper examines the possibilities of providing throughput guarantees in a network-on-chip by appropriate traffic routing. A source routing function is used to find routes with a specified throughput for the data streams in a streaming multiprocessor system-on-chip. The influence of the routing algorithm, network topology and communication locality on the routing performance is studied. The results show that our method for providing throughput guarantees to streaming traffic is feasible. The communication locality has the strongest influence on the routing performance, while the routing algorithm has the weakest influence. Therefore, the mapping algorithm is of greater importance for the system performance than the routing algorithm, and it is profitable to use a more complex mapping algorithm that preserves the communication locality together with a simple routing algorithm.
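
To make the idea of routing with throughput guarantees concrete, the sketch below reserves bandwidth along a complete route that is computed before a stream starts. It assumes a 2-D mesh, uniform link capacities and a breadth-first search over links with enough spare capacity; these choices and the names `mesh_links`, `find_route` and `allocate` are hypothetical and are not the routing function evaluated in the paper.

```python
from collections import deque


def mesh_links(width, height, capacity):
    """Build the directed links of a width x height mesh with equal capacity."""
    links = {}
    for x in range(width):
        for y in range(height):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < width and ny < height:
                    links[((x, y), (nx, ny))] = capacity
                    links[((nx, ny), (x, y))] = capacity
    return links


def find_route(links, src, dst, demand):
    """Shortest route (in hops) using only links with spare capacity >= demand."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            link = (node, nxt)
            if nxt not in parents and link in links and links[link] >= demand:
                parents[nxt] = node
                queue.append(nxt)
    if dst not in parents:
        return None  # no route can guarantee the requested throughput
    route = []
    node = dst
    while node is not None:
        route.append(node)
        node = parents[node]
    route.reverse()
    return route


def allocate(links, src, dst, demand):
    """Find a route and reserve `demand` on every link along it."""
    route = find_route(links, src, dst, demand)
    if route is None:
        return None
    for a, b in zip(route, route[1:]):
        links[(a, b)] -= demand
    return route


if __name__ == "__main__":
    links = mesh_links(2, 2, 10)
    print(allocate(links, (0, 0), (1, 0), 6))
    print(allocate(links, (0, 0), (1, 0), 6))
    print(allocate(links, (0, 0), (1, 0), 6))
```

In the usage example the second stream has to take a three-hop detour because the direct link no longer has enough spare capacity, and the third request is rejected. This mirrors the abstract's point about locality: streams mapped close together need fewer links and therefore leave more capacity for others.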

    An Automated Design-flow for FPGA-based Sequential Simulation

    In this paper we describe an automated design flow that transforms and maps a given homogeneous or heterogeneous hardware design onto an FPGA that performs a cycle-accurate simulation. The flow replaces the transformation that previously had to be performed manually and can be embedded in existing standard synthesis flows. Compared to the earlier manually translated designs, the automated flow uses fewer FPGA hardware resources and achieves higher simulation frequencies. The implementation of the complete design flow is work in progress.

    Run-time mapping of applications to a heterogeneous reconfigurable tiled system on chip architecture

    This work evaluates an algorithm that maps a number of communicating processes onto a heterogeneous tiled system-on-chip (SoC) architecture at run-time. The mapping algorithm minimizes the total energy consumption while still providing an adequate quality of service (QoS). A realistic example is mapped using this algorithm.